A novel way of computing dissimilarities between nodes of a graph, with application to collaborative filtering and subspace projection of the graph nodes
Authors
Abstract
This work presents a new perspective on characterizing the similarity between elements of a database or, more generally, nodes of a weighted, undirected graph. It is based on a Markov-chain model of random walk through the database. More precisely, we compute quantities (the average commute time, the pseudoinverse of the Laplacian matrix of the graph, etc.) that provide similarities between any pair of nodes, having the nice property of increasing when the number of paths connecting those elements increases and when the "length" of paths decreases. It turns out that the square root of the average commute time is a Euclidean distance and that the pseudoinverse of the Laplacian matrix is a kernel (it contains inner products closely related to commute times). A procedure for computing the subspace projection of the node vectors of the graph that preserves as much variance as possible in terms of the commute-time distance, a principal components analysis (PCA) of the graph, is also introduced. This graph PCA provides a nice interpretation of the "Fiedler vector", widely used for graph partitioning. The model is evaluated on a collaborative-recommendation task where suggestions are made about which movies people should watch based upon what they watched in the past. Experimental results on the MovieLens database show that the Laplacian-based similarities perform well in comparison with other methods. The model, which nicely fits into the so-called "statistical relational learning" framework, could also be used to compute document or word similarities and, more generally, could be applied to machine-learning and pattern-recognition tasks involving a database.
François Fouss, Alain Pirotte and Marco Saerens are with the Information Systems Research Unit (ISYS), IAG, Université catholique de Louvain, Place des Doyens 1, B-1348 Louvain-la-Neuve, Belgium. Email: {saerens, pirotte, fouss}@isys.ucl.ac.be.
Jean-Michel Renders is with the Xerox Research Center Europe, Chemin de Maupertuis 6, 38240 Meylan (Grenoble), France. Email: [email protected].
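The quantities named in the abstract can be illustrated on a toy graph. The sketch below is not the authors' code. It assumes the standard identity n(i, j) = V_G (l⁺_ii + l⁺_jj − 2 l⁺_ij), where l⁺ denotes entries of the pseudoinverse of the Laplacian L = D − A and V_G is the sum of all node degrees. The function name `commute_time_matrix` and the four-node adjacency matrix are hypothetical, chosen only for illustration.

```python
import numpy as np

def commute_time_matrix(A):
    """Average commute times between all node pairs of a connected,
    weighted, undirected graph with symmetric adjacency matrix A.
    Hypothetical helper for illustration, not taken from the paper."""
    A = np.asarray(A, dtype=float)
    degrees = A.sum(axis=1)
    L = np.diag(degrees) - A        # graph Laplacian L = D - A
    L_plus = np.linalg.pinv(L)      # Moore-Penrose pseudoinverse of L
    volume = degrees.sum()          # V_G: sum of all node degrees
    d = np.diag(L_plus)
    # n(i, j) = V_G * (l+_ii + l+_jj - 2 * l+_ij)
    return volume * (d[:, None] + d[None, :] - 2.0 * L_plus)

# Toy graph (an assumption): a triangle 0-1-2 with a pendant node 3 attached to node 2.
A = np.array([
    [0, 1, 1, 0],
    [1, 0, 1, 0],
    [1, 1, 0, 1],
    [0, 0, 1, 0],
])

n = commute_time_matrix(A)
# Square root of the commute time: the Euclidean distance mentioned in the abstract.
# Clipping at zero only guards against tiny negative round-off on the diagonal.
ectd = np.sqrt(np.maximum(n, 0.0))
print(np.round(n, 3))
```

Nodes 0 and 1 are joined by two paths (the direct edge and the detour through node 2), so their commute time is smaller than that of nodes 2 and 3, which are joined by a single edge. This matches the property stated in the abstract that more connecting paths bring nodes closer.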
Similar resources
LPKP: location-based probabilistic key pre-distribution scheme for large-scale wireless sensor networks using graph coloring
Communication security of wireless sensor networks is achieved using cryptographic keys assigned to the nodes. Due to resource constraints in such networks, random key pre-distribution schemes are of high interest. Although most of these schemes do not consider location information, there are scenarios in which nodes can obtain location information after their deployment. In this paper,...
A novel way of computing dissimilarities between nodes of a graph, with application to collaborative filtering
This work presents some general procedures for computing dissimilarities between elements of a database or, more generally, nodes of a weighted, undirected, graph. It is based on a Markov-chain model of random walk through the database. The model assigns transition probabilities to the links between elements, so that a random walker can jump from element to element. A quantity, called the avera...
The topological ordering of covering nodes
The topological ordering algorithm sorts nodes of a directed graph such that the order of the tail of each arc is lower than the order of its head. In this paper, we introduce the notion of covering between nodes of a directed graph. Then, we apply the topological ordering algorithm on graphs containing the covering nodes. We show that there exists a cut set with forward arcs in these...
The Application of New Concepts of Dissimilarities between Nodes of a Graph to Collaborative Filtering
This work presents some general procedures for computing dissimilarities between elements of a database or, more generally, nodes of a weighted, undirected, graph. It is based on a Markov-chain model of random walk through the database. The model assigns transition probabilities to the links between elements, so that a random walker can jump from element to element. A quantity, called the avera...
A Stock Market Filtering Model Based on Minimum Spanning Tree in Financial Networks
There have been several efforts in the literature to extract as much information as possible from financial networks. Most of the research has been concerned with the hierarchical structures, clustering, topology and behavior of the market network, but no notable work on network filtration exists. This paper proposes a stock market filtering model using the correlation - ba...
Journal title:
Volume, issue
Pages -
Publication date 2005